Deep Q-Learning With Q-Matrix Transfer Learning for Novel Fire Evacuation Environment

Authors

Abstract

Deep reinforcement learning (RL) is achieving significant success in various applications like control, robotics, games, resource management, and scheduling. However, the important problem of emergency evacuation, which clearly could benefit from RL, has been largely unaddressed. Indeed, evacuation is a complex task that is difficult to solve with RL. An emergency situation is highly dynamic, with many changing variables and complex constraints that make it challenging to solve. Also, there is no standard benchmark environment available that can be used to train RL agents for evacuation, so a realistic environment design is needed. In this article, we propose the first fire evacuation environment to train RL agents for evacuation planning. The environment is modeled as a graph capturing the building structure. It consists of realistic features like fire spread, uncertainty, and bottlenecks. We provide an implementation of our environment in the OpenAI gym format, to facilitate future research. We also propose a new approach that entails pretraining the network weights of a DQN-based agent [DQN/Double-DQN (DDQN)/Dueling-DQN] to incorporate information on the shortest path to the exit. This is achieved by using tabular $Q$-learning to learn the shortest path on the model's graph. This knowledge is transferred to the network by deliberately overfitting it on the $Q$-matrix. Then, the pretrained DQN model is trained on the fire evacuation environment to generate the optimal evacuation path under time-varying conditions due to fire spread, bottlenecks, and uncertainty. We perform comparisons of the proposed approach with state-of-the-art algorithms like DQN, DDQN, Dueling-DQN, PPO, VPG, state-action-reward-state-action (SARSA), the actor-critic method, and ACKTR. The results show that our method is able to outperform these models by a huge margin, including the original DQN-based models. Finally, we tested our model on a large real building consisting of 91 rooms, with the possibility to move from any room to any other room, hence giving 8281 actions. In order to reduce the action space, we propose a strategy that involves one-step simulation. That is, an importance vector is added to the final output, which acts like an attention mechanism. Using this strategy, the action space is reduced by 90.1%. In this manner, we are able to deal with large action spaces. Hence, our model achieves near-optimal performance on the real-world environment.
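To make the transfer scheme concrete, here is a minimal sketch of the two phases the abstract describes, on a made-up four-room graph: tabular $Q$-learning first learns the shortest path to the exit, and a small network is then deliberately overfit to the resulting $Q$-matrix before any fine-tuning on the full environment. The graph, rewards, and layer sizes are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np
import torch
import torch.nn as nn

# Toy building graph: adjacency[i, j] = 1 if an agent can move from room i to room j.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
n_rooms, exit_room = adjacency.shape[0], 3
gamma, alpha = 0.9, 0.1

# Phase 1: tabular Q-learning of the shortest path (reaching the exit is terminal, reward 1).
Q = np.zeros((n_rooms, n_rooms))
for _ in range(5000):
    s = np.random.randint(n_rooms)
    a = np.random.choice(np.flatnonzero(adjacency[s]))  # random valid move
    target = 1.0 if a == exit_room else gamma * Q[a].max()
    Q[s, a] += alpha * (target - Q[s, a])

# Phase 2: transfer by deliberately overfitting a small Q-network on the Q-matrix.
net = nn.Sequential(nn.Linear(n_rooms, 64), nn.ReLU(), nn.Linear(64, n_rooms))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
states = torch.eye(n_rooms)                      # one-hot room encoding
targets = torch.tensor(Q, dtype=torch.float32)   # tabular Q-values as regression targets
for _ in range(2000):                            # many epochs: overfitting is the point here
    loss = nn.functional.mse_loss(net(states), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()
# The pretrained net would then be fine-tuned on the dynamic fire environment.
```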


Similar articles

Deep Reinforcement Learning with Double Q-Learning

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether this harms performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-le...
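For reference, here is a minimal sketch of the Double DQN target this abstract describes: the online network selects the next action while the target network evaluates it, which curbs the overestimation bias of the plain max operator. Tensor shapes and the helper name are assumptions.

```python
import torch

def double_dqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    with torch.no_grad():
        # Action selection with the online network ...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ... but action evaluation with the target network.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q
```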


Distributed Deep Q-Learning

We propose a distributed deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is based on the deep Q-network, a convolutional neural network trained with a variant of Q-learning. Its input is raw pixels and its output is a value function estimating future rewards from taking an action given a system state....
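For context, here is a sketch of the kind of convolutional Q-network described above, mapping raw pixel frames to one Q-value per action. The layer sizes follow the well-known DQN architecture and are assumptions here, not details taken from this paper.

```python
import torch.nn as nn

def make_q_network(n_actions, in_channels=4):
    # Classic DQN-style ConvNet: stacked frames in, one Q-value per action out.
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # assumes 84x84 input frames
        nn.Linear(512, n_actions),              # Q-value estimate per action
    )
```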


Evaluating project’s completion time with Q-learning

Nowadays, project management is a key component of introductory operations management. Educators and researchers in these areas advocate representing a project as a network and applying network-model solution approaches to assist project managers in monitoring completion. In this paper, we evaluate a project's completion time utilizing the Q-learning algorithm. So the ...
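Only a truncated abstract is shown, so the exact formulation is unclear; one plausible reading, sketched below, is to treat each activity's duration as the reward on a project network, so that value-maximizing Q-learning recovers the critical (longest) path, whose length is the completion time. The network and durations are invented for illustration.

```python
import numpy as np

# Toy activity network: succ[node] = [(next_node, duration_in_days), ...].
succ = {0: [(1, 3), (2, 5)], 1: [(3, 2)], 2: [(3, 4)], 3: []}
Q = {s: {a: 0.0 for a, _ in succ[s]} for s in succ}
alpha = 0.1
for _ in range(3000):
    s = 0
    while succ[s]:  # walk from project start to finish, updating as we go
        a, d = succ[s][np.random.randint(len(succ[s]))]
        best_next = max(Q[a].values(), default=0.0)
        # Duration as (undiscounted) reward: maximizing finds the longest,
        # i.e., critical, path through the network.
        Q[s][a] += alpha * (d + best_next - Q[s][a])
        s = a

print(max(Q[0].values()))  # ~9.0: the critical path 0 -> 2 -> 3 (5 + 4 days)
```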



Natural Gradient Deep Q-learning

This paper presents findings for training a Q-learning reinforcement learning agent using natural gradient techniques. We compare the original deep Q-network (DQN) algorithm to its natural gradient counterpart (NGDQN), measuring NGDQN and DQN performance on classic control environments without target networks. We find that NGDQN performs favorably relative to DQN, converging to significantly b...
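For context, the natural gradient update alluded to above preconditions the ordinary gradient with the inverse Fisher information matrix (this is the standard formulation, not a detail taken from this paper):

$$\theta \leftarrow \theta + \alpha\, F(\theta)^{-1} \nabla_\theta J(\theta), \qquad F(\theta) = \mathbb{E}\!\left[\nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top}\right].$$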



Journal

Journal title: IEEE Transactions on Systems, Man, and Cybernetics

Year: 2021

ISSN: 1083-4427, 1558-2426

DOI: https://doi.org/10.1109/tsmc.2020.2967936